Research Article | Open Access
Volume 2025 | Article ID 100097 | https://doi.org/10.1016/j.plaphe.2025.100097

De-occlusion models and diffusion-based data augmentation for size estimation of on-plant oriental melons

Sungjay Kim,1 Xianghui Xin,1,2 Sang-Yeon Kim,1,3 Gyumin Kim,1,2 Min-gyu Baek,1,2 Do Yeon Won,4 Chang Hyeon Baek,4 Ghiseok Kim,1,2,3

1Department of Biosystems Engineering, Seoul National University, Seoul, 08826, the Republic of Korea
2Integrated Major in Global Smart Farm, Seoul National University, Seoul, 08826, the Republic of Korea
3Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, the Republic of Korea
4Seongju Korean Melon Fruit and Vegetable Research Institute, Seongju-gun, 40054, the Republic of Korea

Received 20 Mar 2025
Accepted 17 Aug 2025
Published 21 Aug 2025

Abstract

Accurate fruit size estimation is crucial for plant phenotyping, as it enables precise crop management and enhances agricultural productivity by providing essential data for analyzing growth and resource efficiency. In this study, we estimated the size of on-plant oriental melons grown in a vertical cultivation system to address the challenges posed by leaf occlusion. For data augmentation, a diffusion model was used to generate synthetic leaves that cover existing fruits, creating an enriched dataset. Three instance segmentation models, namely mask region-based convolutional neural network (CNN), Mask2Former, and detection transformer (DETR), together with six de-occlusion models derived from these architectures, were implemented. These models successfully inferred both the visible and occluded areas of the fruit. Notably, Amodal Mask2Former and occlusion-aware RCNN (ORCNN) achieved average precision scores of 85.92 % and 85.35 %, respectively. The inferred masks were used to estimate the height and diameter of the fruit; for height and diameter, respectively, Amodal Mask2Former yielded mean absolute errors of 5.46 mm and 4.20 mm and mean absolute percentage errors of 4.86 % and 5.33 %. The results indicate that the transformer-based Amodal Mask2Former outperforms CNN-based architectures in de-occlusion and size estimation. Finally, the improvement of the de-occlusion models over conventional models was assessed and demonstrated across occlusion ratios ranging from 0 to 70 %. However, generating synthetic datasets with occlusion ratios above 70 % remains a limitation.
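
The quantities reported in the abstract (occlusion ratio, mean absolute error, and mean absolute percentage error) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how the occlusion ratio was computed, so the definition below (hidden area divided by full amodal area) is an assumption, and the function names, the toy masks, and the measurement values are all hypothetical.

```python
import numpy as np


def occlusion_ratio(amodal_mask, visible_mask):
    """Fraction of the full (amodal) fruit area that is hidden.

    Assumed definition: 1 - visible_area / amodal_area.
    """
    amodal = np.asarray(amodal_mask, dtype=bool)
    # Only count visible pixels that lie inside the amodal mask.
    visible = np.asarray(visible_mask, dtype=bool) & amodal
    total = int(amodal.sum())
    if total == 0:
        raise ValueError("amodal mask is empty")
    return float(1.0 - visible.sum() / total)


def mean_absolute_error(predicted, measured):
    """Mean absolute error, in the units of the inputs (e.g. mm)."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(predicted - measured)))


def mean_absolute_percentage_error(predicted, measured):
    """Mean absolute percentage error, in percent of the measured value."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if np.any(measured == 0):
        raise ValueError("measured values must be non-zero")
    return float(100.0 * np.mean(np.abs(predicted - measured) / np.abs(measured)))


# Toy example (made-up data): a 4 x 5 pixel fruit with its two
# right-hand columns covered by a leaf.
amodal = np.zeros((6, 7), dtype=bool)
amodal[1:5, 1:6] = True            # 20 pixels: full fruit extent
visible = amodal.copy()
visible[:, 4:6] = False            # 8 pixels hidden, 12 remain visible

ratio = occlusion_ratio(amodal, visible)

# Made-up predicted vs. manually measured fruit heights in mm.
predicted_mm = [100.0, 110.0]
measured_mm = [105.0, 100.0]
mae_mm = mean_absolute_error(predicted_mm, measured_mm)
mape_pct = mean_absolute_percentage_error(predicted_mm, measured_mm)
```

With these toy inputs the occlusion ratio is 0.4 and the mean absolute error is 7.5 mm. Clipping the visible mask to the amodal mask keeps the ratio within 0 to 1 even if the two predicted masks disagree slightly at the fruit boundary.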

© 2019-2023 Plant Phenomics. All rights reserved. ISSN 2643-6515.